Adds GCSFS Microbenchmarks #722
base: main
Conversation
* Fixing Block size and consistency options in Extended GCSFS Open (#34)
* Separate versioned and non-versioned tests to use different bucket
* Update cleanup logic in tests to empty the bucket instead of deleting the bucket
* Add consistency and blocksize for ext gcsfs
* Merge conflicts resolution
* bucket type paramet for fs test for block size and consistency
* removed unused mocks variables from some tests
* fixing lint errors
* fixed small issue with core tests so it can run with exp flag true

---------

Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>

* pytest microbenchmarks for seq and random reads both single multi threaded
* multiprocess benchmarks
* script to run tests
* undo settings for bucket names and logs
* benchmark script updates
* file size and bucket type decorators
* file size configuration
* removed zonal config
* Added README
* Readme update
* Moving settings and fixture to tests root
* Readme update
* Readme update
* Ignore benchmark pytests in CI
* benchmark hook fix
* adding skip tests flag
* benchmark plugin conditional enablement
* Fixing PR Comments, simplifying the configuration by doing auto gen
* Fixing PR Comments
* default settings

---------

Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>
/gcbrun
martindurant left a comment:
I have very little to say on this - I'm excited to see the results and what improvements your efforts can make!
> ## Introduction
>
> This document describes the microbenchmark suite for `gcsfs`. These benchmarks are designed to measure the performance of various I/O operations under different conditions. They are built using `pytest` and the `pytest-benchmark` plugin to provide detailed performance metrics for single-threaded, multi-threaded, and multi-process scenarios.
Style: comments are harder to read with these long lines; I recommend sticking to an 80-character limit.
I would also test against concurrent.interpreters, since it is now available in Python 3.14, and also against free-threaded builds if all the upstream dependencies support them.
concurrent.interpreters is something I will take up in future PRs.
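For readers unfamiliar with `pytest-benchmark`, here is a minimal sketch of what one of these read microbenchmarks might look like. The bucket and object names are placeholders, not the PR's actual fixtures or configuration.

```python
import gcsfs


def test_sequential_read(benchmark):
    # `benchmark` is the fixture provided by pytest-benchmark; it calls the
    # wrapped function repeatedly and records timing statistics.
    fs = gcsfs.GCSFileSystem()  # uses default application credentials
    path = "my-benchmark-bucket/testfile"  # placeholder object

    def read_whole_file():
        with fs.open(path, "rb") as f:
            return f.read()

    data = benchmark(read_whole_file)
    assert data  # sanity check that bytes were actually read
```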
> You can install them using pip:
>
> ```bash
> pip install -r gcsfs/tests/perf/microbenchmarks/requirements.txt
> ```
Many projects are using tools like uv to specify the dependencies and run commands in a more declarative way, rather than in descriptive README text like this. Might be worth thinking about.
I have added the dependencies to the environment file as well, so a conda run works out of the box. For non-conda scenarios we will have to live with requirements.txt for now; this is something we can revisit in the future.
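As an illustration of the reviewer's suggestion, a script with PEP 723 inline metadata can declare its dependencies in the file itself, which tools like `uv run` understand. The dependency list below is an assumption and may not match the PR's actual requirements file.

```python
# /// script
# dependencies = [
#     "gcsfs",
#     "pytest",
#     "pytest-benchmark",
# ]
# ///
# Running this file with `uv run <script>.py` resolves the dependencies
# declared above into an ephemeral environment before executing it.
import pytest

if __name__ == "__main__":
    # --benchmark-only is a pytest-benchmark flag: run only benchmark tests.
    pytest.main(["--benchmark-only"])
```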
> ```python
> benchmark_plugin_installed = False
>
>
> @pytest.fixture
> ```
Session scope?
We want this fixture to create new files on every test run, hence keeping it function-scoped. If we create files once per session, the results will include caching done by GCS, which we don't want here.
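To make the trade-off concrete, a function-scoped fixture along these lines would upload a fresh object for every test, so GCS-side caching of a reused object cannot skew the numbers. The bucket name, object size, and fixture name are illustrative, not the PR's actual code.

```python
import uuid

import pytest
import gcsfs


@pytest.fixture  # default function scope: a new object per test invocation
def benchmark_file():
    fs = gcsfs.GCSFileSystem()
    path = f"my-benchmark-bucket/bench-{uuid.uuid4().hex}"
    with fs.open(path, "wb") as f:
        f.write(b"x" * (1024 * 1024))  # 1 MiB of dummy data
    yield path
    fs.rm(path)  # remove the object so runs do not accumulate files
```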
> ```python
> def wrapper():
>     base_cases = base_cases_func()
>     new_cases = []
>     for case in base_cases:
> ```
Can't this kind of thing be done with parametrize?
The configuration has now moved to YAML, so this no longer applies.
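A rough sketch of that YAML-driven approach, assuming a config file with a `read_cases` list; the file name and schema here are assumptions, not the PR's actual layout.

```python
import pytest
import yaml

with open("benchmark_config.yaml") as f:
    READ_CASES = yaml.safe_load(f)["read_cases"]


@pytest.mark.parametrize("case", READ_CASES, ids=lambda c: c["name"])
def test_read(case, benchmark):
    # Each YAML entry becomes one parametrized benchmark case; fields such as
    # case["file_size"] or case["threads"] would drive the actual read logic.
    ...
```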
> ```python
>     return wrapper
>
>
> def _get_bucket_name_for_type(bucket_type: str) -> str:
> ```
This is just a dictionary :)
Moved to a map in conftest so it can also be reused by other tests in the future.
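The reviewer's point, sketched as the conftest-level mapping the reply describes; the bucket names are placeholders.

```python
# A plain module-level dict replaces the helper function; lookups raise a
# KeyError for unknown bucket types, which is usually the desired behaviour.
BUCKET_FOR_TYPE = {
    "regional": "my-regional-benchmark-bucket",
    "zonal": "my-zonal-benchmark-bucket",
}

bucket = BUCKET_FOR_TYPE["regional"]
```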
> ```python
>     return wrapper
>
>
> def with_threads(base_cases_func: Callable) -> Callable:
> ```
This also looks like code duplication; I suggest these generator functions could be factored out.
The generators are now removed; the configuration has moved to a YAML file and the YAML values are converted to parameters.
> ```python
> @with_bucket_types(["regional", "zonal"])
> @with_file_sizes
> @with_threads
> @with_processes
> ```
Perhaps it doesn't matter, since this will only get written once, but decorators are less easy to understand than simple sequential code. This ends up working like parametrize, which is fine and a well-established pattern. But I wonder if it could have been written more simply within the function.
Same as above: removed the decorators and moved to a YAML-based config.
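For comparison with the stacked custom decorators quoted above, stacked `pytest.mark.parametrize` decorators produce the same cross-product of cases with a well-known mechanism; the values below are illustrative, not the PR's configuration.

```python
import pytest


@pytest.mark.parametrize("bucket_type", ["regional", "zonal"])
@pytest.mark.parametrize("file_size_mb", [16, 256])
@pytest.mark.parametrize("threads", [1, 8])
def test_read_matrix(bucket_type, file_size_mb, threads, benchmark):
    # 2 x 2 x 2 = 8 combinations are generated automatically by pytest.
    ...
```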
> ```text
> # Dependencies for running the performance benchmarks
> gcsfs
> ```
Should this refer to the local directory explicitly? We are assuming the user has already installed gcsfs from source.
Done, thanks for the feedback.
* Fixing Block size and consistency options in Extended GCSFS Open (#34)
* Separate versioned and non-versioned tests to use different bucket
* Update cleanup logic in tests to empty the bucket instead of deleting the bucket
* Add consistency and blocksize for ext gcsfs
* Merge conflicts resolution
* bucket type paramet for fs test for block size and consistency
* removed unused mocks variables from some tests
* fixing lint errors
* fixed small issue with core tests so it can run with exp flag true

---------

Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>

* pytest microbenchmarks for seq and random reads both single multi threaded
* multiprocess benchmarks
* script to run tests
* undo settings for bucket names and logs
* benchmark script updates
* file size and bucket type decorators
* file size configuration
* removed zonal config
* Added README
* Readme update
* Moving settings and fixture to tests root
* Readme update
* Readme update
* Ignore benchmark pytests in CI
* benchmark hook fix
* adding skip tests flag
* benchmark plugin conditional enablement
* Fixing PR Comments, simplifying the configuration by doing auto gen
* Fixing PR Comments
* default settings
* added resource monitoring for benchmarks
* minor refactoring
* moved config to yaml
* lint fixes
* config yaml update for full read suite
* undo zonal file logging changes
* simplify single threaded read
* bringing back requirements
* update readme
* csv generation fix when some tests fail
* psutil install in cloudbuild
* psutil install in cloudbuild
* Removing zonal conditional code and updating config

---------

Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>
/gcbrun

/gcbrun
* Fixing Block size and consistency options in Extended GCSFS Open (#34)
* Separate versioned and non-versioned tests to use different bucket
* Update cleanup logic in tests to empty the bucket instead of deleting the bucket
* Add consistency and blocksize for ext gcsfs
* Merge conflicts resolution
* bucket type paramet for fs test for block size and consistency
* removed unused mocks variables from some tests
* fixing lint errors
* fixed small issue with core tests so it can run with exp flag true

---------

Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>

* pytest microbenchmarks for seq and random reads both single multi threaded
* multiprocess benchmarks
* script to run tests
* undo settings for bucket names and logs
* benchmark script updates
* file size and bucket type decorators
* file size configuration
* removed zonal config
* Added README
* Readme update
* Moving settings and fixture to tests root
* Readme update
* Readme update
* Ignore benchmark pytests in CI
* benchmark hook fix
* adding skip tests flag
* benchmark plugin conditional enablement
* Fixing PR Comments, simplifying the configuration by doing auto gen
* Fixing PR Comments
* default settings
* added resource monitoring for benchmarks
* minor refactoring
* moved config to yaml
* lint fixes
* config yaml update for full read suite
* undo zonal file logging changes
* simplify single threaded read
* bringing back requirements
* update readme
* csv generation fix when some tests fail
* psutil install in cloudbuild
* psutil install in cloudbuild
* Removing zonal conditional code and updating config
* parallel file creation in setup
* merge conflicts
* merge conflicts - lint fixes
* lint issues

---------

Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>
/gcbrun

/gcbrun

/gcbrun
This PR introduces a comprehensive microbenchmark suite for `gcsfs` to measure the performance of I/O operations under various conditions. The framework is built using `pytest` and the `pytest-benchmark` plugin, providing detailed performance metrics for single-threaded, multi-threaded, and multi-process scenarios.

Key Changes:

Microbenchmark Framework: A new microbenchmark suite has been added under `gcsfs/tests/perf/microbenchmarks/`. It includes:

* A `README.md` with instructions on how to set up and run the benchmarks.
* A `run.py` that simplifies the process of running benchmarks and generating reports.
* A `conftest.py` for setting up and tearing down the test environment, including creating temporary files of specified sizes.

Read Benchmarks: The initial implementation focuses on read performance with the following features:

* Sequential (`seq`) and random (`rand`) read patterns.

Configuration and Settings:

* Benchmark parameters are defined in `gcsfs/tests/perf/microbenchmarks/read/parameters.py` and can be customized using environment variables as defined in `gcsfs/tests/settings.py`.
* Benchmarks are enabled by setting the `GCSFS_BENCHMARK_SKIP_TESTS` environment variable to `false`.

Reporting:

* The `run.py` script generates a timestamped directory containing detailed results in both JSON and CSV formats.

For Reviewers:

* The core benchmark logic is in `gcsfs/tests/perf/microbenchmarks/read/test_read.py`.
* Configuration lives in `gcsfs/tests/perf/microbenchmarks/read/configs.py` and `gcsfs/tests/settings.py`.
* The `README.md` file in the microbenchmarks directory should be reviewed for clarity and completeness.

This new benchmark suite will be a valuable tool for more data-driven performance analysis and for ensuring that `gcsfs` remains highly performant.
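As a rough illustration of the enablement flag described above, a gate along these lines could sit in the benchmark `conftest.py`; the exact hook used in the PR may differ.

```python
import os

import pytest

# Benchmarks are skipped unless GCSFS_BENCHMARK_SKIP_TESTS is explicitly "false".
SKIP_BENCHMARKS = os.environ.get("GCSFS_BENCHMARK_SKIP_TESTS", "true").lower() != "false"

benchmark_skip = pytest.mark.skipif(
    SKIP_BENCHMARKS,
    reason="set GCSFS_BENCHMARK_SKIP_TESTS=false to run the microbenchmarks",
)
```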